fix(dictionary_rare): remove “empress” and “empresses” words #3678

Open
wants to merge 1 commit into
base: main
Conversation

Kristinita commented Mar 26, 2025

Question in case the pull request is declined

Please provide more details about the criteria for including words in the dictionary_rare.txt dictionary. The only information I found comes from codespell --help:

'rare' for rare (but valid) words that are likely to be errors

In my view, dictionary_rare.txt should contain words and word forms that were used in English in previous centuries but have almost fallen out of use in the 21st century. The words “empress” and “empresses” don’t match this criterion: when people speak and write about empresses now, in 2025, they still use the word “empress”.

Thanks.

The words “empress” and “empresses” aren’t “rare”; they are still used when we talk about empresses.

Signed-off-by: Kristinita <Kristinita@users.noreply.github.com>
Kristinita requested a review from peternewman as a code owner March 26, 2025 08:01
@DimitriPapadopoulos
Collaborator

DimitriPapadopoulos commented Mar 26, 2025

The rare dictionary is for rare English words, not non-English or deprecated words.

As already explained, I believe the rare dictionary should be disabled by default - but currently it's not. In the meantime, you may simply ignore false positives in your projects.

Please provide more details about the criteria for including words in the dictionary_rare.txt dictionary. The only information I found comes from codespell --help:

You'll find more details scattered in existing codespell issues. You should be able to find them with a GitHub search. I think we would welcome a PR that would gather that scattered information and suggest more formal criteria.

@Kristinita
Author

Type: Reply 💬

1. Checking the frequency of English words

The rare dictionary is for rare English words

If the phrase “rare English words” means “words with a low frequency in the English language”, we can check the frequency online. A list of the most widely used online English corpora is available; of these, Google Books Ngram Viewer contains more words than the other corpora.

2. Queries to Google Books Ngram Viewer

2.1. “empress” vs. “impress”

“empress” query to Google Books Ngram Viewer — 0.0004093305%

“empress” query to Google Books Ngram Viewer graph

“impress” query to Google Books Ngram Viewer — 0.0004247406%

“impress” query to Google Books Ngram Viewer graph

In books from 2022 on Google Books, the words “empress” and “impress” were found with almost the same frequency.

2.2. “Empress” vs. “Impress”

“Empress” query to Google Books Ngram Viewer — 0.0004773066%

“Empress” query to Google Books Ngram Viewer graph

“Impress” query to Google Books Ngram Viewer — 0.0000105055%

“Impress” query to Google Books Ngram Viewer graph

In books from 2022 on Google Books, the word “Empress” was found about 45 times more often than the word “Impress”.
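
The two comparisons above can be checked with a few lines of arithmetic. This is only a sketch: the percentages are the 2022 values quoted in this comment, and the `FREQ_2022` table and `ratio` helper are names made up for illustration, not part of codespell or of any Ngram Viewer API.

```python
# 2022 frequencies (percent of all words) as quoted in this comment.
# The dictionary name and the helper below are illustrative only.
FREQ_2022 = {
    "empress": 0.0004093305,
    "impress": 0.0004247406,
    "Empress": 0.0004773066,
    "Impress": 0.0000105055,
}


def ratio(a, b, freq=FREQ_2022):
    """Return how many times more frequent word `a` is than word `b`."""
    return freq[a] / freq[b]


lower = ratio("empress", "impress")  # lowercase comparison (section 2.1)
upper = ratio("Empress", "Impress")  # capitalized comparison (section 2.2)

print(f"empress / impress = {lower:.2f}")  # -> 0.96
print(f"Empress / Impress = {upper:.1f}")  # -> 45.4
```

A ratio close to 1 for the lowercase forms and roughly 45 for the capitalized forms is what the two statements above rely on.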

Thanks.

@DimitriPapadopoulos
Collaborator

I have to agree there has been a recent surge of occurrences of empress:
https://books.google.com/ngrams/graph?content=empress%2Cimpress&year_start=1800&year_end=2022&corpus=en

Maybe related to the current decline of democracy and the rise of authoritarian regimes around the world 😄
